AITopics

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
Europe > Netherlands > South Holland > Dordrecht (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati

The power of absolute discounting: all-dimensional distribution estimation

Neural Information Processing Systems, Nov-21-2025, 06:52:09 GMT

Neural Information Processing Systems http://nips.cc/

absolute discounting, machine learning, natural language

Country:

North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Maryland (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Nov-19-2025

Leveraging LLM-based agents for social science research: insights from citation network simulations

Ji, Jiarui, Lei, Runlin, Pan, Xuchen, Wei, Zhewei, Sun, Hao, Lin, Yankai, Chen, Xu, Yang, Yongzheng, Li, Yaliang, Ding, Bolin, Wen, Ji-Rong

The emergence of Large Language Models (LLMs) demonstrates their potential to encapsulate the logic and patterns inherent in human behavior simulation by leveraging extensive web data pre-training. However, the boundaries of LLM capabilities in social simulation remain unclear. To further explore the social attributes of LLMs, we introduce the CiteAgent framework, designed to generate citation networks based on human-behavior simulation with LLM-based agents. CiteAgent successfully captures predominant phenomena in real-world citation networks, including power-law distribution, citational distortion, and shrinking diameter. Building on this realistic simulation, we establish two LLM-based research paradigms in social science: LLM-SE (LLM-based Survey Experiment) and LLM-LE (LLM-based Laboratory Experiment). These paradigms facilitate rigorous analyses of citation network phenomena, allowing us to validate and challenge existing theories. Additionally, we extend the research scope of traditional science of science studies through idealized social experiments, with the simulation experiment results providing valuable insights for real-world academic environments. Our work demonstrates the potential of LLMs for advancing science of science research in social science.

citation network, large language model, machine learning

2511.03758

Country: Asia > China (0.29)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, Marco Baroni

Anti-efficient encoding in emergent communication

Neural Information Processing Systems, Oct-2-2025, 12:03:48 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, communication, machine learning

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Sep-3-2025

Exploring and Reshaping the Weight Distribution in LLM

Ye, Chunming, Li, Songzhou, Xu, Xu

The performance of Large Language Models is influenced by characteristics such as architecture, model size, and decoding method. Due to differences in structure or function, the weights in different layers of large models have varying distributions. This paper explores the correlations between different types of layers in terms of weight distribution and studies the potential impact of these correlations on LoRA training effectiveness. First, the study reveals that the cosine distances between the weights of different layers in the model follow a power-law distribution. We extract query-projection, down-projection, and other weight matrices from the self-attention and MLP layers, calculate their singular values using singular value decomposition, and organize a certain number of singular values into matrices according to projection type. Analysis of the probability distribution of the cosine distances between these matrices shows distinct power-law characteristics. Second, based on the distance calculations and analysis across different layers of the model, a qualitative method is proposed to describe the distribution characteristics of different models. Next, to construct weights that align with these distribution characteristics, a data generator is designed using a combination of Gaussian process and Pareto distribution functions. The generator simulates data that aligns with specific distribution characteristics. Finally, based on these distribution characteristics and the data generation method, the weights in LoRA initialization are reshaped for training. Experimental results indicate that, without altering the model structure or training process, this method achieves a certain improvement in the performance of LoRA training.

distribution characteristic, large language model, machine learning

2509.00046

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Neural Information Processing Systems, Aug-15-2025, 18:27:25 GMT

70596d70542c51c8d9b4e423f4bf2736-Paper-Conference.pdf

artificial intelligence, machine learning, representation

Country:

North America > Canada > Quebec > Montreal (0.29)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-9-2025, 06:01:30 GMT

Distributed Power-law Graph Computing: Theoretical and Empirical Analysis

Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang

With the emergence of big graphs in a variety of real applications such as social networks, machine learning based on distributed graph-computing (DGC) frameworks has attracted much attention from the big data machine learning community. In DGC frameworks, the graph partitioning (GP) strategy plays a key role in performance, affecting both workload balance and communication cost. Typically, the degree distributions of natural graphs from real applications follow skewed power laws, which makes GP a challenging task. Many methods have recently been proposed to solve the GP problem, but existing GP methods cannot achieve satisfactory performance for applications with power-law graphs. In this paper, we propose a novel vertex-cut method for GP, called degree-based hashing (DBH), which makes effective use of the skewed degree distributions. We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance. Furthermore, empirical results on several large power-law graphs show that DBH can outperform the state of the art.
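
The abstract does not spell out how DBH assigns edges, so the following is only a minimal sketch under a stated assumption: each edge is placed on the partition obtained by hashing its lower-degree endpoint, which is how a degree-based vertex-cut is commonly described. The function names, the integer vertex ids, and the use of modulo as a stand-in hash are all hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict


def dbh_partition(edges, num_parts):
    """Assign each edge to a partition by hashing its lower-degree endpoint.

    Assumption (not stated in the abstract): the lower-degree endpoint
    decides the partition, so high-degree hubs are the vertices that get cut.
    Vertex ids are assumed to be non-negative integers and the graph simple.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    assignment = {}
    for u, v in edges:
        # Ties are broken toward the first endpoint.
        anchor = u if degree[u] <= degree[v] else v
        # Modulo stands in for a real hash function here.
        assignment[(u, v)] = anchor % num_parts
    return assignment


def replication_factor(assignment):
    """Average number of partitions on which each vertex appears."""
    parts_of = defaultdict(set)
    for (u, v), part in assignment.items():
        parts_of[u].add(part)
        parts_of[v].add(part)
    return sum(len(parts) for parts in parts_of.values()) / len(parts_of)


# A star graph: vertex 0 is a hub connected to six leaves.
star_edges = [(0, leaf) for leaf in range(1, 7)]
star_assignment = dbh_partition(star_edges, num_parts=3)
star_load = Counter(star_assignment.values())
```

On the star graph, only the hub is replicated across partitions while every leaf stays on a single partition, and the edges are spread evenly. The replication factor is one simple proxy for communication cost in a vertex-cut setting; the paper's own cost model may differ.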

artificial intelligence, data mining, machine learning

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Jan-22-2025, 17:50:18 GMT

Reviews: Anti-efficient encoding in emergent communication

This paper provides a focused study of the distribution of message lengths in an emergent communication task. A Lewis-type signaling game is constructed in which referents are generated from a power-law distribution. RNN "speaker" and "listener" models are constructed to communicate via a discrete channel (with variable vocabulary size and max length) and trained to maximize success at the signaling game using a vanilla policy gradient algorithm. It is observed that more frequent referents are associated with *longer* messages from the speaker agent. This is in contrast to natural language (exemplified by corpus data from English and Arabic and two simple computational models).
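
To make the contrast in this review concrete, here is a minimal sketch of why message length matters when referents follow a power law. It assumes a Zipf-like distribution with probability proportional to 1/rank^exponent; the review does not give the exponent, and the function names and the toy length assignments are hypothetical. It does not reproduce the RNN agents or the policy-gradient training.

```python
import random


def zipf_probs(n, exponent=1.0):
    """Probabilities proportional to 1 / rank**exponent for ranks 1..n."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_referents(probs, k, seed=0):
    """Draw k referent indices (0 = most frequent) from the distribution."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=k)


def expected_length(probs, lengths):
    """Expected message length when referent i gets a message of lengths[i]."""
    return sum(p * length for p, length in zip(probs, lengths))


probs = zipf_probs(4)
efficient_lengths = [1, 2, 3, 4]       # frequent referents get short messages
anti_efficient_lengths = [4, 3, 2, 1]  # frequent referents get long messages

efficient_cost = expected_length(probs, efficient_lengths)
anti_efficient_cost = expected_length(probs, anti_efficient_lengths)
```

Under this toy setup the anti-efficient assignment has the higher expected message length, which is the pattern the review reports for the trained agents, in contrast to natural language.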

emergent communication, natural language, referent

Technology: Information Technology > Artificial Intelligence > Natural Language (0.61)

arXiv.org Artificial Intelligence, Dec-23-2024

Understanding Artificial Neural Network's Behavior from Neuron Activation Perspective

Zhang, Yizhou, Sui, Yang

This paper explores the intricate behavior of deep neural networks (DNNs) through the lens of neuron activation dynamics. We propose a probabilistic framework that analyzes models' neuron activation patterns as a stochastic process, uncovering theoretical insights into neural scaling laws, such as over-parameterization and the power-law decay of loss with respect to dataset size. By deriving key mathematical relationships, we show that the number of activated neurons increases in the form of $N(1-(\frac{bN}{D+bN})^b)$, and that neuron activation should follow a power-law distribution. Based on these two mathematical results, we demonstrate how DNNs maintain generalization capabilities even under over-parameterization, and we elucidate the phase transition phenomenon observed in loss curves when dataset size is plotted on a log axis (i.e., the data magnitude increases linearly). Moreover, by combining the above two phenomena with the power-law distribution of neuron activation, we derive the power-law decay of the neural network's loss function as the data size increases. Furthermore, our analysis bridges the gap between empirical observations and theoretical underpinnings, offering experimentally testable predictions regarding parameter efficiency and model compressibility. These findings provide a foundation for understanding neural network scaling and present new directions for optimizing DNN performance.
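
The closed form quoted in this abstract can be evaluated directly. The sketch below assumes, since the abstract does not define its symbols, that N is the number of neurons, D is the dataset size, and b is a positive constant; the function name is hypothetical.

```python
def activated_neurons(n_neurons, dataset_size, b):
    """Evaluate N * (1 - (b*N / (D + b*N))**b) from the abstract.

    Assumed meanings (not given in the abstract): n_neurons is N,
    dataset_size is D, and b is a positive constant.
    """
    ratio = (b * n_neurons) / (dataset_size + b * n_neurons)
    return n_neurons * (1.0 - ratio ** b)


# Evaluate the expression on a grid of dataset sizes spaced by powers of ten.
curve = [activated_neurons(100, 10 ** k, b=1) for k in range(7)]
```

Under these assumptions the expression is 0 when D = 0, grows monotonically with D, and approaches N as D becomes large, which is consistent with the abstract's description of the count of activated neurons increasing with data.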

artificial intelligence, machine learning, neural network

2412.18073

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial Intelligence, Oct-9-2024

Dense Optimizer : An Information Entropy-Guided Structural Search Method for Dense-like Neural Network Design

Tianyuan, Liu, Libin, Hou, Linyuan, Wang, Xiyu, Song, Bin, Yan

The Dense Convolutional Network has been continuously refined toward a highly efficient and compact architecture, owing to its lightweight structure. However, current Dense-like architectures are mainly designed manually, and it becomes increasingly difficult to adjust the channels and reuse level based on past experience. We therefore propose an architecture search method called Dense Optimizer that can search for high-performance dense-like networks automatically. In Dense Optimizer, we view the dense network as a hierarchical information system and maximize the network's information entropy while constraining the distribution of the entropy across each stage via a power law, thereby constructing an optimization problem. We also propose a branch-and-bound optimization algorithm that tightly integrates the power-law principle with search space scaling to solve the optimization problem efficiently. The superiority of Dense Optimizer has been validated on different computer vision benchmark datasets. Specifically, Dense Optimizer completes a high-quality search in only 4 hours on one CPU. Our searched model DenseNet-OPT achieved a top-1 accuracy of 84.3% on CIFAR-100, which is 5.97% higher than the original.

dense optimizer, entropy, optimization

2410.07499

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)